Scientific Progress


Accomplishment  1:  “Is  Distrust  the  Negation  of  Trust?  The  Value  of  Distrust  in  Social  Media” 

•  Research  Problem  Studied:  Trust  plays  an  important  role  in  helping  online  users  collect  reliable  information,  and  has 
attracted  increasing  attention  in  recent  years.  We  learn  from  social  sciences  that,  as  the  conceptual  counterpart  of  trust,  distrust 
could  be  as  important  as  trust.  However,  little  work  exists  in  studying  distrust  in  social  media.  What  is  the  relationship  between 
trust  and  distrust?  Can  we  directly  apply  methodologies  from  social  sciences  to  study  distrust  in  social  media?  In  this  paper,  we 
design  two  computational  tasks  by  leveraging  data  mining  and  machine  learning  techniques  to  enable  the  computational 
understanding  of  distrust  with  social  media  data.  The  first  task  is  to  predict  distrust  from  only  trust,  and  the  second  task  is  to 
predict trust with distrust. We conduct experiments on real-world social media data. The empirical results of the first task provide
concrete evidence to answer the question, “is distrust the negation of trust?” while the results of the second task help us figure
out how valuable distrust is in trust prediction.

•  Key  Contributions:  As  informed  by  social  sciences,  distrust  could  be  as  important  as  trust.  A  fundamental  problem  about 
distrust  is  what  the  relation  between  trust  and  distrust  is.  Passive  observation  is  the  modus  operandi  to  obtain  social  media 
data,  which  lacks  necessary  information  to  apply  methodologies  from  social  sciences  to  understand  distrust.  However,  an 
understanding  of  distrust  with  social  media  data  is  necessary  because  if  distrust  is  the  negation  of  trust,  lacking  distrust  study 
matters  little;  while  if  distrust  is  a  new  dimension  of  trust,  ignoring  distrust  in  trust  study  may  yield  an  incomplete  and  biased 
estimate  of  the  effects  of  trust.  In  this  paper,  we  first  investigate  the  properties  of  distrust  and  find  that  we  cannot  equally  and 
conversely extend the properties of trust to distrust. We then design two tasks by leveraging data mining and machine
learning  techniques  to  enable  a  computational  understanding  of  distrust  with  social  media  data.  The  first  task  is  to  predict 
distrust with only trust information, and the second task is to predict trust with distrust information. We conduct experiments on
real-world social media data. The evaluations of the first task suggest that distrust is not the negation of trust, while the results
of  the  second  task  reveal  that  distrust  has  added  value  over  trust. 

Accomplishment  2:  “Scalable  Learning  of  Users'  Preferences  Using  Networked  Data” 

•  Research  Problem  Studied:  Users’  personal  information  such  as  their  political  views  is  important  for  many  applications  such 
as targeted advertisements or real-time monitoring of political opinions. Huge amounts of data generated by social media users
present opportunities and challenges to study these preferences at a large scale. In this paper, we aim to infer social media
users’  political  views  when  only  network  information  is  available.  In  particular,  given  personal  preferences  about  some  of  the 
social media users, how can we infer the preferences of unobserved individuals in the same network? There are many existing
solutions that address the problem of classification with networked data. However, networks in social media normally
involve millions and even hundreds of millions of nodes, which makes scalability an important problem in inferring personal
preferences  in  social  media.  To  address  the  scalability  issue,  we  use  social  influence  theory  to  construct  new  features  based  on 
a  combination  of  local  and  global  structures  of  the  network.  Then  we  use  these  features  to  train  classifiers  and  predict  users’ 
preferences. Due to the size of real-world social networks, using the entire network information is inefficient and not practical
in many cases. By extracting local social dimensions, we present an efficient and scalable solution. Further, by capturing the
network’s global pattern, the proposed solution balances the performance requirement between accuracy and efficiency.

•  Key  Contributions:  In  this  paper  we  studied  the  network-based  approach  of  inferring  users’  personal  preferences.  We 
categorized  the  network-based  algorithms  into  local  and  global  algorithms.  Local  algorithms  use  users’  neighbors  to  predict  their 
preferences, while the global approaches use the entire network information to predict users’ preferences. Our experimental
results show that local algorithms are fast and scalable; however, they need a large amount of labeled data to achieve reasonable
prediction accuracy. Further, their prediction accuracy is always less than the accuracy of global algorithms. Global algorithms,
in contrast, are computationally expensive, but perform well even in cases where only a very small fraction of the data is
labeled.  We  proposed  a  new  algorithm  called  LSocDim  based  on  social  influence  theory  to  bridge  the  efficiency  of  local 
algorithms  and  the  accuracy  of  global  algorithms.  The  experiments  show  the  efficiency  and  the  effectiveness  of  the  proposed 
algorithm. In particular, we show that LSocDim achieves a prediction accuracy close to that of the state-of-the-art global
algorithm, SocDim, while reducing the running time by a factor of up to 40.
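
The idea of a local algorithm, which predicts a user's preference from the labels of that user's neighbors, can be illustrated with a minimal sketch. This is a plain majority vote over labeled neighbors and is not LSocDim itself; the function name, the toy network, and the labels are all hypothetical.

```python
from collections import Counter


def predict_from_neighbors(adjacency, labels, node, default=None):
    """Predict a node's label by majority vote over its labeled neighbors."""
    votes = Counter(labels[n] for n in adjacency.get(node, ()) if n in labels)
    if not votes:
        # No labeled neighbor: a purely local method has nothing to use.
        return default
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]


# Hypothetical toy network: user -> set of neighbors.
adjacency = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann"},
    "cat": {"ann"},
    "dan": {"ann"},
    "eve": set(),
}
# Hypothetical known preferences for a subset of users.
labels = {"bob": "left", "cat": "left", "dan": "right"}

print(predict_from_neighbors(adjacency, labels, "ann"))
print(predict_from_neighbors(adjacency, labels, "eve", default="unknown"))
```

The second call shows the weakness noted above: with few labeled users, many nodes have no labeled neighbor and a local method cannot make an informed prediction.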

Accomplishment  3:  “Leveraging  Knowledge  across  Media  for  Spammer  Detection  in  Microblogging” 

•  Research  Problem  Studied:  While  microblogging  has  emerged  as  an  important  information  sharing  and  communication 
platform,  it  has  also  become  a  convenient  venue  for  spammers  to  overwhelm  other  users  with  unwanted  content.  Currently, 
spammer  detection  in  microblogging  focuses  on  using  social  networking  information,  but  little  on  content  analysis  due  to  the 
distinct  nature  of  microblogging  messages.  First,  label  information  is  hard  to  obtain.  Second,  the  texts  in  microblogging  are  short 
and  noisy.  As  we  know,  spammer  detection  has  been  extensively  studied  for  years  in  various  media,  e.g.,  emails,  SMS  and  the 
web.  Motivated  by  abundant  resources  available  in  the  other  media,  we  investigate  whether  we  can  take  advantage  of  the 
existing  resources  for  spammer  detection  in  microblogging.  While  people  accept  that  texts  in  microblogging  are  different  from 
those  in  other  media,  there  is  no  quantitative  analysis  to  show  how  different  they  are.  In  this  paper,  we  first  perform  a 
comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization
formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct
experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different
media help spammer detection in microblogging.

• Key Contributions: Texts in microblogging are short and noisy, and the labeling process is time-consuming and labor-intensive,
which  presents  great  challenges  for  spammer  detection.  In  this  paper,  we  first  conduct  a  quantitative  analysis  to  study  how 
noisy  the  microblogging  texts  are  by  comparing  them  with  spam  messages  from  other  media.  The  results  suggest  that 
microblogging  data  is  not  significantly  different  from  data  from  the  other  media.  Based  on  the  observations,  a  matrix 
factorization model is employed to learn lexicon information from external spam resources. By incorporating external
information from other media and content information from microblogging, we propose a novel framework for spammer
detection.  The  experimental  results  demonstrate  the  effectiveness  of  our  proposed  model  as  well  as  the  roles  of  different  types 
of  information  in  spammer  detection. 

Accomplishment  4:  “Online  Social  Spammer  Detection” 

•  Research  Problem  Studied:  The  explosive  use  of  social  media  also  makes  it  a  popular  platform  for  malicious  users,  known 
as  social  spammers,  to  overwhelm  normal  users  with  unwanted  content.  One  effective  way  for  social  spammer  detection  is  to 
build a classifier based on content and social network information. However, social spammers are sophisticated and adapt
to game the system with fast-evolving content and network patterns. First, social spammers continually change their spamming
content  patterns  to  avoid  being  detected.  Second,  reflexive  reciprocity  makes  it  easier  for  social  spammers  to  establish  social 
influence  and  pretend  to  be  normal  users  by  quickly  accumulating  a  large  number  of  “human”  friends.  It  is  challenging  for 
existing  anti-spamming  systems  based  on  batch-mode  learning  to  quickly  respond  to  newly  emerging  patterns  for  effective 
social  spammer  detection.  In  this  paper,  we  present  a  general  optimization  framework  to  collectively  use  content  and  network 
information for social spammer detection and provide the solution for efficient online processing. Experimental results on
Twitter  datasets  confirm  the  effectiveness  and  efficiency  of  the  proposed  framework. 

• Key Contributions: Social spammers are sophisticated and adapt to game the system by continually changing their
content and network patterns. To handle fast-evolving social spammers, we propose to use online learning to efficiently reflect
the newly emerging patterns. In this paper, we develop a general social spammer detection framework with both content and
network information, and provide its online learning updating rules. In particular, we use a directed graph Laplacian to model
social network information, which is further integrated into a matrix factorization framework for content information modeling. By
investigating its online updating scheme, we provide an efficient way for social spammer detection. Experimental results show
that our proposed method is effective and efficient compared with other social spammer detection methods.
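
To give a sense of how a graph Laplacian encodes network information as a smoothness penalty, the following minimal sketch uses the simpler undirected Laplacian L = D - A, not the directed graph Laplacian used in the paper. It checks the standard identity that u^T L u equals half the sum of A[i][j] * (u[i] - u[j])^2 over all ordered pairs; the function names and the toy graph are assumptions for illustration only.

```python
def laplacian(adj):
    """Return the undirected graph Laplacian L = D - A as nested lists."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(adj, u):
    """Return u^T L u for a symmetric adjacency matrix and one score per user."""
    lap = laplacian(adj)
    n = len(u)
    return sum(u[i] * lap[i][j] * u[j] for i in range(n) for j in range(n))


def pairwise_penalty(adj, u):
    """Return 0.5 * sum over ordered pairs of adj[i][j] * (u[i] - u[j])^2."""
    n = len(u)
    return 0.5 * sum(
        adj[i][j] * (u[i] - u[j]) ** 2 for i in range(n) for j in range(n)
    )


# Hypothetical path graph 0 - 1 - 2 with one latent score per user.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = [1.0, 0.0, -1.0]

print(quadratic_form(adj, scores), pairwise_penalty(adj, scores))
```

Minimizing this term pushes connected users toward similar latent scores, which is the general role a Laplacian regularizer plays when it is added to a factorization objective.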

Accomplishment  5:  “Mining  Social  Media  with  Social  Theories:  A  Survey” 

•  Research  Problem  Studied:  The  increasing  popularity  of  social  media  encourages  more  and  more  users  to  participate  in 
various online activities and produces data at an unprecedented rate. Social media data is big, linked, noisy, highly unstructured
and incomplete, and differs from data in traditional data mining, which cultivates a new research field - social media mining.
Social theories from social sciences are helpful to explain social phenomena. The scale and properties of social media data are
very different from those of the data that social sciences use to develop social theories. As a new type of social data, social media data
raises a fundamental question - can we apply social theories to social media data? Recent advances in computer science
provide  necessary  computational  tools  and  techniques  for  us  to  verify  social  theories  on  large-scale  social  media  data.  Social 
theories  have  been  applied  to  mining  social  media.  In  this  article,  we  review  some  key  social  theories  in  mining  social  media, 
their  verification  approaches,  interesting  findings,  and  state-of-the-art  algorithms.  We  also  discuss  some  future  directions  in  this 
active  area  of  mining  social  media  with  social  theories. 

•  Key  Contributions:  The  social  nature  of  social  media  data  calls  for  new  techniques  and  tools  and  cultivates  a  new  field  - 
social  media  mining.  Social  theories  from  social  sciences  have  been  proven  to  be  applicable  to  mining  social  media.  Integrating 
social theories with computational models is becoming an interesting way of mining social media data and is making exciting
progress in various social media mining tasks. In this article, we review three key social theories, i.e., social correlation theory,
balance theory and status theory, in mining social media data. In detail, we introduce basic concepts, verification methods,
interesting findings and the state-of-the-art algorithms to exploit these social theories in social media mining tasks, which can
be categorized into feature engineering, constraint generating and objective defining.

Accomplishment  6:  “Social  Recommendation:  A  Review” 

•  Research  Problem  Studied:  Recommender  systems  play  an  important  role  in  helping  online  users  find  relevant  information 
by  suggesting  information  of  potential  interest  to  them.  Due  to  the  potential  value  of  social  relations  in  recommender  systems, 
social  recommendation  has  attracted  increasing  attention  in  recent  years.  In  this  paper,  we  present  a  review  of  existing 
recommender  systems  and  discuss  some  research  directions.  We  begin  by  giving  formal  definitions  of  social  recommendation 
and  discuss  the  unique  property  of  social  recommendation  and  its  implications  compared  with  those  of  traditional  recommender 
systems. Then, we classify existing social recommender systems into memory-based social recommender systems and
model-based social recommender systems, according to the basic models adopted to build the systems, and review representative
systems for each category. We also present some key findings from both positive and negative experiences in building social
recommender systems, and research directions to improve social recommendation capabilities.

•  Key  Contributions:  Social  recommendation  has  attracted  broad  attention  from  both  academia  and  industry,  and  many  social 
recommender systems have been proposed in recent years. In this paper, we first give a narrow definition and a broad
definition  of  social  recommendation  to  cover  most  existing  definitions  of  social  recommendation  in  literature,  and  discuss  the 
unique  feature  of  social  recommender  systems  as  well  as  its  implications.  We  classify  current  social  recommender  systems  into 
memory-based  social  recommender  systems  and  model-based  social  recommender  systems  according  to  the  basic  models 
chosen to build the systems, and then present a review of representative systems for each category. We also discuss some
key findings from positive and negative experiences in applying social recommender systems. Social recommendation is still in
the early stages of development and needs further improvement. Finally, we present research directions that can potentially
improve the performance of social recommender systems, including exploiting the heterogeneity of social networks and weak
dependence  connections,  microcosmic  investigation  of  users  and  items,  considering  temporal  information  in  rating  and  social 
information,  understanding  the  role  of  negative  relations,  and  integrating  cross-media  data. 

Accomplishment 7: "Exploiting Homophily Effect for Trust Prediction" (hTrust)

• Research Problem Studied: Trust plays a crucial role for online users who seek reliable information. However, in reality,
user-specified trust relations are very sparse, i.e., a tiny number of pairs of users with trust relations are buried in a
disproportionately  large  number  of  pairs  without  trust  relations,  making  trust  prediction  a  daunting  task.  As  an  important  social 
concept,  however,  trust  has  received  growing  attention  and  interest.  Social  theories  are  developed  for  understanding  trust. 
Homophily  is  one  of  the  most  important  theories  that  explain  why  trust  relations  are  established.  Exploiting  the  homophily  effect 
for trust prediction provides challenges and opportunities. In this paper, we take on these challenges and investigate the trust
prediction  problem  with  the  homophily  effect.  First,  we  delineate  how  it  differs  from  existing  approaches  to  trust  prediction  in  an 
unsupervised  setting.  Next,  we  formulate  the  new  trust  prediction  problem  into  an  optimization  problem  integrated  with 
homophily,  empirically  evaluate  our  approach  on  two  datasets  from  real-world  product  review  sites,  and  compare  with 
representative  algorithms  to  gain  a  deep  understanding  of  the  role  of  homophily  in  trust  prediction. 

• Key Contributions: In this paper, we study the problem of exploiting the homophily effect for trust prediction. First, we conduct
experiments on datasets from real-world product review sites to demonstrate the existence of homophily in trust relations.
Homophily regularization is then introduced to capture the homophily effect in trust relations. An unsupervised framework is
proposed,  incorporating  low-rank  matrix  factorization  and  homophily  regularization.  Extensive  experiments  are  conducted  to 
evaluate  the  proposed  framework  on  real-world  trust  relation  datasets  and  the  experimental  results  demonstrate  the 
effectiveness  of  our  proposed  framework  as  well  as  the  role  of  homophily  regularization  for  trust  prediction. 
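
One way to picture homophily regularization is as a penalty that grows when two similar users are given distant latent representations. The sketch below is an illustration under stated assumptions, not the hTrust formulation: it measures similarity as the cosine similarity of rating vectors (an assumed choice) and sums similarity-weighted squared distances between hypothetical latent vectors.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(ratings_a) & set(ratings_b)
    dot = sum(ratings_a[item] * ratings_b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def homophily_penalty(latent, similarity):
    """Sum of similarity[(i, j)] * ||latent[i] - latent[j]||^2 over user pairs."""
    total = 0.0
    for (i, j), weight in similarity.items():
        total += weight * sum((a - b) ** 2 for a, b in zip(latent[i], latent[j]))
    return total


# Hypothetical ratings from a product review site.
ratings = {
    "u1": {"x": 4, "y": 2},
    "u2": {"x": 4, "y": 2},
    "u3": {"z": 5},
}
similarity = {
    ("u1", "u2"): cosine_similarity(ratings["u1"], ratings["u2"]),
    ("u1", "u3"): cosine_similarity(ratings["u1"], ratings["u3"]),
}
# Hypothetical two-dimensional latent vectors for each user.
latent = {"u1": [1.0, 0.0], "u2": [0.0, 0.0], "u3": [5.0, 5.0]}

print(similarity)
print(homophily_penalty(latent, similarity))
```

Here only the similar pair (u1, u2) contributes to the penalty, so an optimizer would be pushed to move their latent vectors closer while leaving the dissimilar pair unconstrained.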

Accomplishment  8:  "Exploiting  Local  and  Global  Social  Context  for  Recommendation" 

•  Research  Problem  Studied:  With  the  fast  development  of  social  media,  the  information  overload  problem  becomes 
increasingly  severe  and  recommender  systems  play  an  important  role  in  helping  online  users  find  relevant  information  by 
suggesting information of potential interest. Social activities for online users produce abundant social relations. Social relations
provide  an  independent  source  for  recommendation,  presenting  both  opportunities  and  challenges  for  traditional  recommender 
systems.  Users  are  likely  to  seek  suggestions  from  both  their  local  friends  and  users  with  high  global  reputations,  motivating  us 
to  exploit  social  relations  from  local  and  global  perspectives  for  online  recommender  systems  in  this  paper.  We  develop 
approaches to capture local and global social relations, and propose a novel framework, LOCABAL, taking advantage of both
local  and  global  social  context  for  recommendation.  Empirical  results  on  real-world  datasets  demonstrate  the  effectiveness  of 
our  proposed  framework  and  further  experiments  are  conducted  to  understand  how  local  and  global  social  context  work  for  the 
proposed  framework. 

•  Key  Contributions:  The  availability  of  social  relations  presents  both  challenges  and  opportunities  for  traditional 
recommender  systems.  In  this  paper,  we  investigate  how  to  exploit  local  and  global  social  context  for  recommendation.  To 
capture local social context, we require that the preferences of two socially connected users be correlated, as suggested by
social correlation theories, and we also study the connections between our proposed approach and existing approaches.
Ratings from users with high reputations are more likely to be trustworthy; therefore, to capture global social context, we use
user reputation scores to weight the importance of their ratings. With these solutions, we propose a framework, LOCABAL, to
integrate local and global social context for recommendation. Experimental results on real-world datasets show that the
proposed framework LOCABAL outperforms representative social recommender systems. Further experiments are conducted
to understand how LOCABAL works.
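
Weighting ratings by user reputation can be sketched as a weighted squared-error loss over observed ratings. The example below is a minimal illustration and not the LOCABAL objective: the weights, the latent factors, and the function name are hypothetical, and how reputation scores are actually computed is not specified here.

```python
def weighted_rating_loss(ratings, user_factors, item_factors, weights):
    """Sum over observed ratings of weights[user] * (rating - prediction)^2."""
    loss = 0.0
    for (user, item), rating in ratings.items():
        # Prediction is the inner product of user and item latent vectors.
        prediction = sum(
            p * q for p, q in zip(user_factors[user], item_factors[item])
        )
        loss += weights[user] * (rating - prediction) ** 2
    return loss


# Hypothetical observed ratings and latent factors.
ratings = {("a", "i1"): 5.0, ("b", "i1"): 1.0}
user_factors = {"a": [1.0, 1.0], "b": [1.0, 1.0]}
item_factors = {"i1": [2.0, 1.0]}

# Hypothetical reputation weights: user "a" is treated as more reputable.
reputation_weights = {"a": 1.0, "b": 0.5}
uniform_weights = {"a": 1.0, "b": 1.0}

print(weighted_rating_loss(ratings, user_factors, item_factors, reputation_weights))
print(weighted_rating_loss(ratings, user_factors, item_factors, uniform_weights))
```

Both users have the same prediction error, but the lower weight on user "b" halves that user's contribution, so the fitted item factors would lean toward the rating of the more reputable user.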

Accomplishment  9:  "A  Tool  for  Collecting  Provenance  Data  in  Social  Media" 

•  Research  Problem  Studied:  In  recent  years,  social  media  sites  have  provided  a  large  amount  of  information.  Recipients  of 
such  information  need  mechanisms  to  know  more  about  the  received  information,  including  the  provenance.  Previous  research 
has shown that some attributes related to the received information provide additional context, so that a recipient can assess the
amount  of  value,  trust,  and  validity  to  be  placed  in  the  received  information.  Personal  attributes  of  a  user,  including  name, 
location,  education,  ethnicity,  gender,  and  political  and  religious  affiliations,  can  be  found  in  social  media  sites.  In  this  paper,  we 
present  a  novel  web-based  tool  for  collecting  the  attributes  of  interest  associated  with  a  particular  social  media  user  related  to 
the  received  information.  This  tool  provides  a  way  to  combine  different  attributes  available  at  different  social  media  sites  into  a 
single  user  profile.  Using  different  types  of  Twitter  users,  we  also  evaluate  the  performance  of  the  tool  in  terms  of  number  of 
attribute  values  collected,  validity  of  these  values,  and  total  amount  of  retrieval  time. 

•  Key  Contributions:  The  provenance  data  collector  tool  aims  to  collect  provenance  attribute  values  of  a  user.  By  collecting 
such values of a user related to the received information, the tool could help recipients understand more about the
received  information.  Data  generated  on  social  media  sites  is  largely  distributed  and  unstructured  in  nature.  The  proposed  tool 
also  provides  a  way  to  combine  such  distributed  and  unstructured  social  media  data. 

Accomplishment  10:  "Recovering  Information  Recipients  in  Social  Media  via  Provenance" 

•  Research  Problem  Studied:  In  recent  years,  social  media  has  changed  the  way  we  interact  and  communicate.  Although  the 
existing structure of social media allows users to easily create, receive, and propagate pieces of information, users often
do not have background knowledge about the received information, including the provenance (sources or originators) of
information,  and  other  recipients  who  may  have  retransmitted  or  modified  the  information.  Providing  such  additional  context  to 
the  received  information  can  help  users  know  how  much  value,  trust,  and  validity  should  be  placed  in  received  information.  To 
judge the credibility of the received piece of information, it is vital to know who its sources are, and how information propagates
from sources to other social media users. In this paper, we study a novel research problem: enabling a few known
recipients to recover other unknown recipients and to seek the provenance of information. The experimental results with
Facebook  and  Twitter  datasets  show  that  the  proposed  algorithm  is  effective  in  correctly  recovering  the  unknown  recipients  and 
seeking  the  provenance  of  information. 

• Key Contributions: Social media allows its users to share a vast amount of information with other users, but it provides its
users no mechanism to learn more about the received information. In this paper, we aim to recover information recipients as
well as seek the provenance by knowing a few nodes and using only link information in social networks. Information recipients
exist  along  the  paths  from  the  sources  to  the  known  nodes.  In  this  paper  we  seek  the  information  propagation  flow  from  the 
sources to the known nodes, and recover the most likely information recipients. Using the experimental results from the Facebook
and  Twitter  datasets,  we  show  that  the  proposed  algorithm  is  effective  in  correctly  recovering  the  information  recipients  and 
seeking  the  provenance  of  information. 

Accomplishment  11:  "Context-Aware  Review  Helpfulness  Rating  Prediction" 

•  Research  Problem  Studied:  Online  reviews  play  a  vital  role  in  the  decision-making  process  for  online  users.  Helpful  reviews 
are  usually  buried  in  a  large  number  of  unhelpful  reviews,  and  with  the  consistently  increasing  number  of  reviews,  it  becomes 
more and more difficult for online users to find helpful reviews. Therefore, most online review websites allow online users to rate
the helpfulness of a review, and a global helpfulness score is computed for the review based on its available ratings. However,
in reality, user-specified helpfulness ratings for reviews are very sparse: a few reviews attract large numbers of helpfulness
ratings while most reviews obtain few or even no helpfulness ratings. The available helpfulness ratings are too sparse for
online users to assess the helpfulness of reviews. Also, a review is not necessarily equally helpful for all users,
and users with different backgrounds may judge the helpfulness of a review very differently. The user idiosyncrasy of review
helpfulness  motivates  us  to  study  the  problem  of  review  helpfulness  rating  prediction  in  this  paper.  We  first  identify  various 
types  of  context  information,  model  them  mathematically,  and  propose  a  context-aware  review  helpfulness  rating  prediction 
framework  CAP.  Experimental  results  demonstrate  the  effectiveness  of  the  proposed  framework  and  the  importance  of  context 
awareness  in  solving  the  review  helpfulness  rating  prediction  problem. 

•  Key  Contributions:  In  this  paper  we  study  the  problem  of  review  helpfulness  rating  prediction  by  exploiting  context 
awareness  to  infer  unknown  helpfulness  ratings  automatically,  motivated  by  the  fact  that  helpful  reviews  can  be  buried  in  large 
amounts  of  useless  reviews  and  the  user-specific  helpfulness  ratings  are  too  sparse  for  online  users  to  assess  the  helpfulness 
of reviews. We first show that the problem we study differs from the review quality prediction problem and the item rating
prediction problem. We extract four types of social context, i.e., author context, rater context, connection context and
preference context, formulate them mathematically, and propose a context-aware helpfulness prediction framework, CAP, which
exploits content context and various types of social context. Experimental results demonstrate that our proposed framework
outperforms the state-of-the-art baseline methods in both cold-start and warm-start settings, and further experiments are
conducted  to  understand  the  importance  of  context  awareness  in  the  proposed  framework. 

Accomplishment  12:  "Seeking  Provenance  of  Information  in  Social  Media" 

•  Research  Problem  Studied:  Social  media  has  profoundly  impacted  the  way  people  interact  and  communicate.  Social  media 
propagates breaking news and disinformation alike, fast and on an unsurpassed scale. Because of social media's democratizing nature,
users can easily produce, receive and propagate a piece of information without necessarily providing traceable
information. Thus, there are no means for a user to verify the provenance (also known as sources or originators) of information.
Disinformation can have tragic consequences for society and individuals. This work aims to take advantage of
characteristics  of  social  media  to  provide  a  solution  to  the  problem  of  lacking  traceable  information.  Such  knowledge  can 
provide  additional  context  to  the  received  information  such  that  a  user  can  assess  how  much  value,  trust,  and  validity  should  be 
placed in received information. In this paper, we study a novel research problem: enabling a few known recipients
(less than 1% of the total recipients) to seek the provenance of information by recovering how it has flowed from its originators.
The proposed methodology exploits easily computable node centralities of a large social media network. The experimental
results  with  Facebook  and  Twitter  datasets  show  that  the  proposed  mechanism  is  effective  in  correctly  identifying  the  additional 
recipients  and  seeking  the  provenance  of  information. 

• Key Contributions: Social media allows its users to share a vast amount of information with other users, but it lacks
mechanisms that provide traceable knowledge about the received information for its users. In this paper, we study a novel
research problem: enabling a few P-nodes (less than 1% of total recipients) to seek the provenance of information by
identifying how it has flowed from its originators. To this end, we first formally present the problem and provide the complexity
analysis. Then, we use the Facebook and Twitter datasets to show that two hypotheses hold: Degree Propensity and
Closeness Propensity. The proposed methodology then exploits these hypotheses to provide not only the critical information
about the provenance, but also the most likely provenance paths. Finally, using the experimental results with the Facebook and
Twitter datasets, we show that the proposed algorithm is effective in correctly identifying the additional transmitters and seeking
the provenance of information.
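
The node centralities that the methodology relies on are cheap to compute. The sketch below computes degree and closeness centrality on a small undirected graph with a breadth-first search; it does not implement the provenance-seeking algorithm or the Degree Propensity and Closeness Propensity hypotheses themselves, and the graph and function names are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Return the number of neighbors of every node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph, source):
    """Return (number of reachable nodes) / (sum of shortest-path distances)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    total = sum(distance.values())
    # An isolated node reaches nobody, so its closeness is defined as 0 here.
    return (len(distance) - 1) / total if total > 0 else 0.0


# Hypothetical undirected network: node "b" is a hub connected to everyone.
graph = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"b"},
}

print(degree_centrality(graph))
print(closeness_centrality(graph, "b"), closeness_centrality(graph, "a"))
```

On this toy graph the hub "b" has both the highest degree and the highest closeness, which is the kind of node a centrality-guided search would examine first.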

Accomplishment  13:  "A  Tool  for  Assisting  Provenance  Search  in  Social  Media" 

• Research Problem Studied: In recent years, social media sites have witnessed an information explosion. Determining the
reliability of such a large amount of information is a major area of research. Information provenance (also known as sources or origin)
provides  a  way  to  measure  the  reliability  of  information  in  social  networks.  The  main  challenge  in  seeking  provenance  is  the 
availability  of  suitable  data  consisting  of  sufficient  unique  propagation  paths.  Current  research  on  provenance  in  social  media 
uses  synthetically  generated  propagation  paths.  Although  these  proposed  approaches  are  theoretically  significant,  it  is  still  a 
challenge  to  apply  and  evaluate  them  on  social  media.  Hence,  knowledge  of  the  actual  propagation  paths  for  a  piece  of 
information  will  be  a  valuable  asset  in  provenance  search.  This  paper  presents  a  tool  for  capturing  the  propagation  network  of  a 
given  tweet  or  URL  (Uniform  Resource  Locator)  in  the  Twitter  network.  Researchers  can  use  this  tool  to  collect  information 
propagation  data,  design  effective  strategies  for  determining  the  provenance,  and  gain  information  about  the  tweet  such  as 
impact,  growth  rate  and  users  influencing  the  spread.  An  overview  of  the  user  interface  and  the  architecture  of  the  system  is 
provided. Two case studies, one relating to disinformation in riot situations and another on corporate involvement in education,
are presented to demonstrate the effectiveness of the system for seeking provenance information.


• Key Contributions: The paper presents a tool to obtain the spread of a given tweet or URL on the Twitter network. The tool
presents researchers with a propagation network to assist in seeking the provenance path of a given tweet. The provenance
path gives additional information to assess the reliability of a given piece of data in social media.
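
To make the notion of a propagation network and a provenance path concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the collected data has already been reduced to (receiver, sender) pairs; the record format, user names, and function names are hypothetical and do not describe the tool's actual architecture or any Twitter API.

```python
from collections import defaultdict

def build_propagation_network(records):
    """Map each sender to the list of users who received the item from them."""
    children = defaultdict(list)
    for receiver, sender in records:
        children[sender].append(receiver)
    return dict(children)

def find_origins(records):
    """Users who passed the item on but never appear as receivers."""
    senders = {sender for _, sender in records}
    receivers = {receiver for receiver, _ in records}
    return sorted(senders - receivers)

def path_to_origin(user, records):
    """Follow sender links back from a user toward the origin."""
    parent = {receiver: sender for receiver, sender in records}
    path = [user]
    seen = {user}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # guard against cyclic records
            break
        path.append(nxt)
        seen.add(nxt)
    return path

# Hypothetical records: (receiver, sender) pairs for one tweet.
records = [("bob", "alice"), ("carol", "alice"), ("dave", "carol")]
network = build_propagation_network(records)
origins = find_origins(records)
dave_path = path_to_origin("dave", records)
```

Here `dave_path` traces the item from "dave" back through "carol" to "alice", which is the provenance path a researcher would inspect.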

Accomplishment 14: "Provenance Data in Social Media"

• Book Overview: Social media shatters the barrier to communicate anytime anywhere for people of all walks of life. The
publicly available, virtually free information in social media poses a new challenge to consumers who have to discern whether a
piece of information published in social media is reliable. For example, it can be difficult to understand the motivations behind a
statement passed from one user to another, without knowing the person who originated the message. Additionally, false
information can be propagated through social media, resulting in embarrassment or irreversible damages. Provenance data
associated with a social media statement can help dispel rumors, clarify opinions, and confirm facts. However, provenance data
about social media statements is not readily available to users today. Currently, providing this data to users requires changing
the social media infrastructure or offering subscription services. Taking advantage of social media features, research in this
nascent field spearheads the search for a way to provide provenance data to social media users, thus leveraging social media
itself by mining it for the provenance data. Searching for provenance data reveals an interesting problem space requiring the
development and application of new metrics in order to provide meaningful provenance data to social media users. This lecture
reviews the current research on information provenance, explores exciting research opportunities to address pressing needs,
and shows how data mining can enable a social media user to make informed judgements about statements published in social
media.

• Table of Contents: Information Provenance in Social Media / Provenance Attributes / Provenance via Network Information /
Provenance Data

Accomplishment 15: "User Vulnerability and its Reduction on a Social Networking Site"

• Research Problem Studied: Privacy and security are major concerns for many users of social media. When users share
information (e.g., data and photos) with friends, they can make their friends vulnerable to security and privacy breaches with
dire consequences. With the continuous expansion of a user's social network, privacy settings alone are often inadequate to
protect a user's profile. In this research, we aim to address some critical issues related to privacy protection: (1) How can we
measure and assess an individual user's vulnerability? (2) With the diversity of one's social network friends, how can one figure out
an effective approach to maintaining balance between vulnerability and social utility? In this work, we first present a novel way
to define vulnerable friends from an individual user's perspective. User vulnerability is dependent on whether or not the user's
friends' privacy settings protect the friend and the individual's network of friends (which includes the user). We show that it is
feasible to measure and assess user vulnerability, and reduce one's vulnerability without changing the structure of a social
networking site. The approach is to unfriend one's most vulnerable friends. However, when such a vulnerable friend is also
socially important, unfriending him would significantly reduce one's own social status. We formulate this novel problem as
vulnerability minimization with social utility constraints. We formally define the optimization problem, and provide an
approximation algorithm with a proven bound. Finally, we conduct a large-scale evaluation of the new framework using a Facebook
dataset. Through experiments, we observe how much vulnerability an individual user can decrease by unfriending a
vulnerable friend. We compare the performance of different unfriending strategies and discuss the security risk of new friend
requests. Additionally, by employing different forms of social utility, we confirm that balance between user vulnerability and social
utility can be practically achieved.

• Key Contributions: We propose a feasible approach to a novel problem of identifying a user's vulnerable friends on a social
networking site. Our work differs from existing work addressing social networking privacy by introducing a vulnerability-centered
approach to user security on a social networking site. On most social networking sites, privacy-related efforts have been
concentrated on protecting individual attributes only. However, users are often vulnerable through community attributes.
Unfriending vulnerable friends can help protect users against the security risks. Based on our study of over 2 million users, we
find that users are either not careful or not aware of security and privacy concerns of their friends. Our model clearly highlights
the impact of each new friend on a user's privacy. Our approach does not require the structural change of a social networking
site and aims to maximally reduce a user's vulnerability while minimizing his social utility loss. The work formulates a novel
problem of constrained vulnerability reduction, suggests a feasible approach, and demonstrates that the problem of constrained
vulnerability reduction is solvable.

Accomplishment 16: “mTrust: Discerning Multi-Faceted Trust in a Connected World”

• Research Problem Studied: The issue of trust has attracted increasing attention from the community of social media
research. Trust, as a social concept, naturally has multiple facets, indicating multiple and heterogeneous trust relationships
between users. Here is a multifaceted trust example from Epinions. Figure 1(a) shows single trust relationships between user 1
and his 20 friends. Here, we can see that user 7 is the most trusted by user 1. Figures 1(b) and 1(c) show their multifaceted
trust relationships in the categories “home and garden” and “restaurants”, respectively. For the category “home and garden”,
user 7 is not necessarily the most trusted friend of user 1. This shows that trust relationships in different categories vary. Thus,
people trust others differently in different facets.

Figure 1: Single trust and multifaceted trust relationships of one user in Epinions: (a) single trust, (b) trust in home and
garden, (c) trust in restaurants. (Note: The thickness of a line indicates the level of trust.)


There are two challenges to study in obtaining multifaceted trust between users: first, the representation of multiple and
heterogeneous trust relationships between users, and second, estimating the strength of multifaceted trust. Traditionally, trust is
represented by an adjacency matrix. However, this cannot capture the multifaceted trust relations. We developed a new
algorithm, mTrust, which extends a matrix representation to a tensor representation, adding an extra dimension for facet
description. Previous work observed a strong correlation between trust and user similarity in the context of rating systems.
Therefore, it is reasonable to embed trust strength inference in rating prediction. Thus, to evaluate the usefulness of
multifaceted trust, this work embeds the multifaceted trust inference in the framework of rating prediction.
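
The step from an adjacency matrix to a tensor with an extra facet dimension can be sketched as follows. This is only an illustration of the data structure under assumed values: the trust strengths, the three-user setup, and the helper names are hypothetical, and the sketch does not implement the mTrust inference itself.

```python
def make_trust_tensor(num_users, facets):
    """Create a users x users x facets tensor of zeros as nested lists."""
    return [[[0.0] * len(facets) for _ in range(num_users)]
            for _ in range(num_users)]

def collapse_to_matrix(tensor):
    """Sum over the facet dimension to recover a single-trust matrix."""
    return [[sum(cell) for cell in row] for row in tensor]

def most_trusted(tensor, user, facet_index):
    """Index of the user that `user` trusts most in one facet."""
    strengths = [cell[facet_index] for cell in tensor[user]]
    return max(range(len(strengths)), key=strengths.__getitem__)

facets = ["home and garden", "restaurants"]
tensor = make_trust_tensor(3, facets)
# Hypothetical strengths: user 0 trusts user 1 mostly for home and garden,
# and user 2 mostly for restaurants.
tensor[0][1] = [0.9, 0.1]
tensor[0][2] = [0.2, 0.9]

single = collapse_to_matrix(tensor)
```

With these assumed values, user 2 has the largest collapsed (single) trust from user 0, yet user 1 is the most trusted in "home and garden", mirroring the observation that the overall most trusted friend need not be the most trusted in every facet.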

• Key Contributions: Interesting findings from the experiments are that (1) more than 20% of reciprocal links are
heterogeneous, (2) more than 14% of transitive trust relations are heterogeneous, and (3) more than 11% of cocitation trust
relations are heterogeneous. With these findings, mTrust can be applied to many online tasks such as improving rating
prediction, enabling facet-sensitive ranking, and making status theory applicable to reciprocal links.

Accomplishment  17:  “eTrust:  Understanding  Trust  Evolution  in  an  Online  World” 

•  Research  Problem  Studied:  Most  existing  research  about  online  trust  assumes  static  trust  relations  between  users.  As  we 
are  informed  by  social  sciences,  trust  evolves  as  humans  interact.  Little  work  exists  studying  trust  evolution  in  an  online  world. 
Researching  online  trust  evolution  faces  unique  challenges  because  more  often  than  not,  available  data  is  from  passive 
observation.  In  this  paper,  we  leverage  social  science  theories  to  develop  a  methodology  that  enables  the  study  of  online  trust 
evolution. In particular, we propose a trust evolution framework, eTrust, which exploits the dynamics of user preferences in
the context of online product review. We present technical details about modeling trust evolution, and perform experiments to
show how the exploitation of trust evolution can help improve the performance of online applications such as rating and trust
prediction.

•  Key  Contributions:  We  study  online  trust  evolution  in  the  context  of  product  review.  By  exploiting  the  correlation  between 
user  preferences  and  trust  relations,  we  propose  a  framework,  eTrust,  to  understand  the  evolution  of  trust  in  an  online  world 
and apply eTrust to various online applications such as rating prediction and trust prediction. Experiments on real-world data
from Epinions yield interesting findings, and eTrust can be applied to improve the performance of rating prediction
and trust prediction.

Accomplishment  18:  “Minimizing  User  Vulnerability  and  Retaining  Social  Utility  in  Social  Media” 

• Research Problem Studied: Privacy and security are major concerns for many users of social media. When users share
information (e.g., data and photos) with friends, they can make their friends vulnerable to security and privacy breaches with dire
consequences. In our earlier work, we showed that it is feasible to measure user vulnerability and reduce one's vulnerability
without changing the structure of a social networking site. The approach is to unfriend one's most vulnerable friends. However,
when such a vulnerable friend is also socially important, unfriending him would significantly reduce one's own social status. In
this work, we address the problem of vulnerability minimization with minimum social utility losses. This work extends the
existing vulnerability reduction model to a more general form. Using a general model, we formulate two discrete optimization
problems. Both problems are NP-hard.

•  Key  Contributions:  We  formally  formulate  the  optimization  problem,  propose  an  approximation  algorithm  with  a  proven 
bound,  and  conduct  empirical  experiments  with  different  forms  of  social  utility  on  a  large-scale  Facebook  dataset  for 
performance evaluation and comparison. Our work differs from existing work on social networking privacy in that our
approach does not require the structural change of a social networking site and aims to maximally reduce a user's vulnerability
while minimizing his social utility loss.
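
The trade-off between vulnerability reduction and social utility loss can be illustrated with a simple greedy heuristic. This sketch is not the approximation algorithm from the paper and carries no proven bound; the per-friend "vulnerability" and "utility" scores, the budget, and the ratio-based ranking are all assumptions chosen for illustration.

```python
def greedy_unfriend(friends, utility_budget):
    """Pick friends to remove, favoring high vulnerability per unit of utility lost.

    Assumes every friend has a strictly positive utility score.
    """
    ranked = sorted(
        friends.items(),
        key=lambda item: item[1]["vulnerability"] / item[1]["utility"],
        reverse=True,
    )
    removed, spent = [], 0.0
    for name, scores in ranked:
        # Only unfriend if the total utility loss stays within the budget.
        if spent + scores["utility"] <= utility_budget:
            removed.append(name)
            spent += scores["utility"]
    return removed, spent

# Hypothetical scores: how much vulnerability each friend contributes and
# how much social utility is lost by unfriending them.
friends = {
    "f1": {"vulnerability": 8.0, "utility": 4.0},
    "f2": {"vulnerability": 6.0, "utility": 1.0},
    "f3": {"vulnerability": 1.0, "utility": 2.0},
    "f4": {"vulnerability": 5.0, "utility": 5.0},
}
removed, spent = greedy_unfriend(friends, utility_budget=5.0)
```

With a budget of 5.0 the heuristic removes "f2" and then "f1", and skips the remaining friends because unfriending either would exceed the allowed utility loss.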

Project Description

Social media has been gaining popularity in recent years and is increasingly becoming an integral part of
our life. Given the extensiveness, instantaneity, and diffusion speed of social media, a single item, e.g., a tweet
or a clip of video, could galvanize a digital revolution or wreak havoc with one's otherwise
routine and uneventful working life. In the presence of adversaries, the convenience and
low barrier to entry of social media bring about new challenges. How well we address these challenges
can directly influence our ability to manage information and misinformation, and the future role
of social media as a reliable communication mechanism. One such pressing challenge is to
assess information trustworthiness in social media. We propose to investigate research issues
related to social media trustworthiness and its assessment by leveraging social research methods,
developing new computational social methods, and creating novel approaches to social media
data collection and sharing.

Research  Problem 

In  social  sciences,  trust  is  about  a  relationship  between  two  entities,  the  trustor  and  the  trustee. 
Trust  can  be  defined  as  the  perception  of  the  trustor  about  the  degree  to  which  the  trustee  would 
satisfy  an  expectation.  Trustworthiness  can  be  defined  from  the  perspective  of  both  entities;  in 
this  work,  it  is  the  perspective  of  the  trustor  that  defines  a  property  that  can  be  judged,  i.e.,  the 
amount  of  trust  associated  with  the  trustee.  In  all  cases  trust  is  a  heuristic  decision  rule,  allowing 
the  human  to  deal  with  complexities  that  would  require  unrealistic  effort  in  rational  reasoning. 
One of the key current challenges is to rethink how the rapid progress of technology has impacted
trust  as  information  technology  has  significantly  changed  how  people  interact,  express 
themselves,  and  behave.  The  assessment  of  information  trustworthiness  in  social  media  requires 
answers  to  the  three  essential  questions  about  the  information:  (1)  source  (or  author),  (2)  author 
position,  and  (3)  content.  The  search  for  the  answers  is  greatly  complicated  by  the  nature  of 
social  media:  enormous  sizes  in  terms  of  users  and  links,  irregular  uses  of  languages,  incomplete 
sentences  or  messages,  and  inordinate  amounts  of  data  and  meta-data.  In  addition,  both  linked 
data  and  attribute  data  are  present  in  social  media.  The  former  represents  the  connectedness 
among entities and the latter the properties of entities. In search of the three answers, we face
three research challenges:


1. Information Provenance - Identifying the true source (or author) of information,
2. Friendship Differentiation - Determining if the author is a friend, acquaintance, or foe, and
3. Content Analysis - Analyzing the content to ascertain its intention, quality, and so on.

In  this  project,  we  focus  on  developing  computational  social  theories  and  methods  for  the  first 
two  challenges.  The  third  challenge  is  partially  addressed  in  our  recent  work.  Additional  work  on 
trust  maintenance  can  be  found  in  literature. 

Appendix for Scientific Progress

Accomplishment  1:  “Is  Distrust  the  Negation  of  Trust?  The  Value  of  Distrust  in  Social  Media” 

• Research Problem Studied: Trust plays an important role in helping online users collect
reliable information, and has attracted increasing attention in recent years. We learn from
social sciences that, as the conceptual counterpart of trust, distrust could be as important
as trust. However, little work exists in studying distrust in social media. What is the
relationship between trust and distrust? Can we directly apply methodologies from social
sciences to study distrust in social media? In this paper, we design two computational
tasks by leveraging data mining and machine learning techniques to enable the
computational understanding of distrust with social media data. The first task is to predict
distrust from only trust, and the second task is to predict trust with distrust. We conduct
experiments on real-world social media data. The empirical results of the first task
provide concrete evidence to answer the question, “is distrust the negation of trust?”
while the results of the second task help us figure out how valuable the use of distrust is in
trust prediction.

• Key Contributions: As informed by social sciences, distrust could be as important as
trust. A fundamental problem about distrust is what the relation between trust and distrust
is. Passive observation is the modus operandi to obtain social media data, which lacks
necessary information to apply methodologies from social sciences to understand distrust.
However, an understanding of distrust with social media data is necessary because if
distrust is the negation of trust, the lack of distrust study matters little; while if distrust is a
new dimension of trust, ignoring distrust in trust study may yield an incomplete and
biased estimate of the effects of trust. In this paper, we first investigate the properties of
distrust and find that we cannot equally and conversely extend the properties of trust to
distrust. We then design two tasks by leveraging data mining and machine learning
techniques to enable a computational understanding of distrust with social media data.
The first task is to predict distrust with only trust information, and the second task is to
predict trust with distrust information. We conduct experiments on real-world social
media data. The evaluations of the first task suggest that distrust is not the negation of
trust, while the results of the second task reveal that distrust has added value over trust.

Accomplishment  2:  “Scalable  Learning  of  Users'  Preferences  Using  Networked  Data” 

• Research Problem Studied: Users’ personal information such as their political views is
important for many applications such as targeted advertisements or real-time monitoring
of political opinions. Huge amounts of data generated by social media users present
opportunities and challenges to study these preferences at a large scale. In this paper, we
aim to infer social media users’ political views when only network information is
available. In particular, given personal preferences about some of the social media users,
how can we infer the preferences of unobserved individuals in the same network? There
are many existing solutions that address the problem of classification with networked
data. However, networks in social media normally involve millions and even
hundreds of millions of nodes, which makes scalability an important problem in
inferring personal preferences in social media. To address the scalability issue, we use
social influence theory to construct new features based on a combination of local and
global structures of the network. Then we use these features to train classifiers and
predict users’ preferences. Due to the size of real-world social networks, using the entire
network information is inefficient and not practical in many cases. By extracting local
social dimensions, we present an efficient and scalable solution. Further, by capturing
the network’s global pattern, the proposed solution balances the performance
requirement between accuracy and efficiency.

• Key Contributions: In this paper we studied the network-based approach of inferring
users’ personal preferences. We categorized the network-based algorithms into local and
global algorithms. Local algorithms use users’ neighbors to predict their preferences,
while the global approaches use the entire network information to predict users’
preferences. Our experimental results show that local algorithms are fast and scalable;
however, they need a large amount of labeled data to achieve reasonable prediction
accuracy. Further, their prediction accuracy is always less than the accuracy of global
algorithms. Global algorithms, in contrast, are computationally expensive, but perform
well even in cases where only a very small fraction of the data is labeled. We proposed a
new algorithm called LSocDim based on social influence theory to bridge the efficiency
of local algorithms and the accuracy of global algorithms. The experiments show the
efficiency and the effectiveness of the proposed algorithm. In particular, we show that
LSocDim achieves a prediction accuracy close to that of the state-of-the-art global
algorithm, SocDim, while decreasing the running time by up to 40 times.
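
The behavior of a local algorithm, which predicts a user's preference from labeled neighbors only, can be sketched with a simple majority vote. This is a generic neighbor-vote baseline written for illustration, not LSocDim or SocDim; the graph, the labels, and the function name are hypothetical.

```python
from collections import Counter

def predict_from_neighbors(graph, labels, node, default=None):
    """Predict a node's label by majority vote among its labeled neighbors."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    if not votes:
        # No labeled neighbor: a purely local method has nothing to go on.
        return default
    return votes.most_common(1)[0][0]

# Hypothetical friendship graph with a few known political views.
graph = {
    "u1": ["u2", "u3", "u4"],
    "u2": ["u1"],
    "u3": ["u1"],
    "u4": ["u1", "u5"],
    "u5": ["u4"],
}
labels = {"u2": "liberal", "u3": "liberal", "u4": "conservative"}

prediction_u1 = predict_from_neighbors(graph, labels, "u1")
prediction_u5 = predict_from_neighbors(graph, labels, "u5")
prediction_u2 = predict_from_neighbors(graph, labels, "u2", default="unknown")
```

The last call falls back to the default because the only neighbor of "u2" is unlabeled, which hints at why local methods need a large amount of labeled data to perform well.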

Accomplishment 3: “Leveraging Knowledge across Media for Spammer Detection in Microblogging”

•  Research  Problem  Studied:  While  microblogging  has  emerged  as  an  important 
information  sharing  and  communication  platform,  it  has  also  become  a  convenient  venue 
for  spammers  to  overwhelm  other  users  with  unwanted  content.  Currently,  spammer 
detection  in  microblogging  focuses  on  using  social  networking  information,  but  little  on 
content  analysis  due  to  the  distinct  nature  of  microblogging  messages.  First,  label 
information  is  hard  to  obtain.  Second,  the  texts  in  microblogging  are  short  and  noisy.  As 
we  know,  spammer  detection  has  been  extensively  studied  for  years  in  various  media, 
e.g.,  emails,  SMS  and  the  web.  Motivated  by  abundant  resources  available  in  the  other 
media,  we  investigate  whether  we  can  take  advantage  of  the  existing  resources  for 
spammer  detection  in  microblogging.  While  people  accept  that  texts  in  microblogging  are 
different  from  those  in  other  media,  there  is  no  quantitative  analysis  to  show  how 
different they are. In this paper, we first perform a comprehensive linguistic study to
compare spam across different media. Inspired by the findings, we present an
optimization  formulation  that  enables  the  design  of  spammer  detection  in  microblogging 
using  knowledge  from  external  media.  We  conduct  experiments  on  real-world  Twitter 
datasets  to  verify  (1)  whether  email,  SMS  and  web  spam  resources  help  and  (2)  how 
different  media  help  for  spammer  detection  in  microblogging. 

• Key Contributions: Texts in microblogging are short and noisy, and the labeling process is
time-consuming and labor-intensive, which presents great challenges for spammer
detection.  In  this  paper,  we  first  conduct  a  quantitative  analysis  to  study  how  noisy  the 
microblogging  texts  are  by  comparing  them  with  spam  messages  from  other  media.  The 
results  suggest  that  microblogging  data  is  not  significantly  different  from  data  from  the 
other  media.  Based  on  the  observations,  a  matrix  factorization  model  is  employed  to  learn 
lexicon  information  from  external  spam  resources.  By  incorporating  external  information 
from  other  media  and  content  information  from  microblogging,  we  propose  a  novel 
framework  for  spammer  detection.  The  experimental  results  demonstrate  the 
effectiveness  of  our  proposed  model  as  well  as  the  roles  of  different  types  of  information 
in  spammer  detection. 

Accomplishment  4:  “Online  Social  Spammer  Detection” 

• Research Problem Studied: The explosive use of social media also makes it a popular
platform for malicious users, known as social spammers, to overwhelm normal users with
unwanted content. One effective way for social spammer detection is to build a classifier
based on content and social network information. However, social spammers are
sophisticated and adapt to game the system with fast-evolving content and network
patterns. First, social spammers continually change their spamming content patterns to
avoid being detected. Second, reflexive reciprocity makes it easier for social spammers to
establish social influence and pretend to be normal users by quickly accumulating a large
number of “human” friends. It is challenging for existing anti-spamming systems based
on batch-mode learning to quickly respond to newly emerging patterns for effective
social spammer detection. In this paper, we present a general optimization framework to
collectively use content and network information for social spammer detection and
provide the solution for efficient online processing. Experimental results on Twitter datasets
confirm the effectiveness and efficiency of the proposed framework.

• Key Contributions: Social spammers are sophisticated and adapt to game the system
by continually changing their content and network patterns. To handle fast-evolving social
spammers, we propose to use online learning to efficiently reflect the newly emerging
patterns. In this paper, we develop a general social spammer detection framework with
both content and network information, and provide its online learning updating rules. In
particular, we use a directed graph Laplacian to model social network information, which is
further integrated into a matrix factorization framework for content information
modeling. By investigating its online updating scheme, we provide an efficient way for
social spammer detection. Experimental results show that our proposed method is
effective and efficient compared with other social spammer detection methods.

Accomplishment  5:  “Mining  Social  Media  with  Social  Theories:  A  Survey” 

• Research Problem Studied: The increasing popularity of social media encourages more
and more users to participate in various online activities and produce data at an
unprecedented rate. Social media data is big, linked, noisy, highly unstructured and
incomplete, and differs from data in traditional data mining, which cultivates a new
research field - social media mining. Social theories from social sciences are helpful to
explain social phenomena. The scale and properties of social media data are very
different from those of the data that social sciences use to develop social theories. As a new type
of social data, social media data raises a fundamental question - can we apply social
theories to social media data? Recent advances in computer science provide necessary
computational tools and techniques for us to verify social theories on large-scale social
media data. Social theories have been applied to mining social media. In this article, we
review some key social theories in mining social media, their verification approaches,
interesting findings, and state-of-the-art algorithms. We also discuss some future
directions in this active area of mining social media with social theories.

• Key Contributions: The social nature of social media data calls for new techniques and
tools and cultivates a new field - social media mining. Social theories from social
sciences have been proven to be applicable to mining social media. Integrating social
theories with computational models is becoming an interesting way of mining social
media data and is making exciting progress in various social media mining tasks. In this
article, we review three key social theories, i.e., social correlation theory, balance theory
and status theory, in mining social media data. In detail, we introduce basic concepts,
verification methods, interesting findings and the state-of-the-art algorithms to exploit
these social theories in social media mining tasks, which can be categorized into feature
engineering, constraint generating and objective defining.
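
As a small illustration of how one of these theories can be verified computationally, the sketch below checks structural balance on signed triads: a triad is balanced when the product of its three edge signs is positive. The signed network, its node names, and the helper functions are hypothetical, and the survey itself may describe different verification procedures.

```python
from itertools import combinations

def is_balanced(sign_ab, sign_bc, sign_ca):
    """A triad is balanced when the product of its three edge signs is positive."""
    return sign_ab * sign_bc * sign_ca > 0

def balanced_fraction(signs):
    """Fraction of fully connected triads that are balanced."""
    nodes = sorted({n for edge in signs for n in edge})

    def sign(u, v):
        # Edges are undirected, so look the pair up in either order.
        return signs.get((u, v), signs.get((v, u)))

    total = balanced = 0
    for a, b, c in combinations(nodes, 3):
        triple = (sign(a, b), sign(b, c), sign(c, a))
        if None in triple:
            continue  # skip triples that do not form a complete triad
        total += 1
        balanced += is_balanced(*triple)
    return balanced / total if total else 0.0

# Hypothetical signed network: +1 is a trust/friend link, -1 a distrust/foe link.
signs = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("a", "d"): -1, ("b", "d"): -1, ("c", "d"): 1,
}
fraction = balanced_fraction(signs)
```

Comparing such a fraction against what a randomly signed network would produce is one simple way to test whether balance theory holds in a dataset.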

Accomplishment  6:  “Social  Recommendation:  A  Review” 

•  Research  Problem  Studied:  Recommender  systems  play  an  important  role  in  helping 
online  users  find  relevant  information  by  suggesting  information  of  potential  interest  to 
them.  Due  to  the  potential  value  of  social  relations  in  recommender  systems,  social 
recommendation  has  attracted  increasing  attention  in  recent  years.  In  this  paper,  we 
present  a  review  of  existing  recommender  systems  and  discuss  some  research  directions. 
We  begin  by  giving  formal  definitions  of  social  recommendation  and  discuss  the  unique 
property  of  social  recommendation  and  its  implications  compared  with  those  of 
traditional  recommender  systems.  Then,  we  classify  existing  social  recommender  systems 
into  memory-based  social  recommender  systems  and  model-based  social  recommender 
systems,  according  to  the  basic  models  adopted  to  build  the  systems,  and  review
representative  systems  for  each  category.  We  also  present  some  key  findings  from  both 
positive  and  negative  experiences  in  building  social  recommender  systems,  and  research 
directions  to  improve  social  recommendation  capabilities.

•  Key  Contributions:  Social  recommendation  has  attracted  broad  attention  from  both 
academia  and  industry,  and  many  social  recommender  systems  have  been  proposed  in 
recent  years.  In  this  paper,  we  first  give  a  narrow  definition  and  a  broad  definition  of
social  recommendation  to  cover  most  existing  definitions  of  social  recommendation  in 
literature,  and  discuss  the  unique  feature  of  social  recommender  systems  as  well  as  its 
implications.  We  classify  current  social  recommender  systems  into  memory-based  social 
recommender  systems  and  model-based  social  recommender  systems  according  to  the 
basic  models  chosen  to  build  the  systems,  and  then  present  a  review  of  representative
systems  for  each  category.  We  also  discuss  some  key  findings  from  positive  and  negative 
experiences  in  applying  social  recommender  systems.  Social  recommendation  is  still  in 
the  early  stages  of  development  and  needs  further  improvement.  Finally,  we  present
research  directions  that  can  potentially  improve  the  performance  of  social  recommender
systems,  including  exploiting  the  heterogeneity  of  social  networks  and  weak  dependence
connections,  microcosmic  investigation  of  users  and  items,  considering  temporal 
information  in  rating  and  social  information,  understanding  the  role  of  negative  relations, 
and  integrating  cross-media  data. 
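
To make the memory-based category concrete, here is a minimal sketch that predicts a rating as the trust-weighted average of the ratings given by users whom the target user trusts. This is one common memory-based formulation, chosen here for illustration only; the data layout and the function name are hypothetical and do not come from the review.

```python
def predict_rating(user, item, ratings, trust):
    """Trust-weighted average of the ratings that trusted users gave to `item`.

    ratings: dict user -> dict item -> rating (hypothetical layout)
    trust:   dict user -> dict trusted_user -> trust weight (hypothetical layout)
    Returns None when no trusted user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for friend, weight in trust.get(user, {}).items():
        rating = ratings.get(friend, {}).get(item)
        if rating is not None:
            weighted_sum += weight * rating
            weight_total += weight
    return weighted_sum / weight_total if weight_total else None
```

Model-based systems instead learn latent user and item representations, typically through matrix factorization, and add social information to the learning objective.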

Accomplishment  7:  "Exploiting  Homophily  Effect  for  Trust  Prediction"  (hTrust)


•  Research  Problem  Studied:  Trust  plays  a  crucial  role  for  online  users  who  seek  reliable 
information.  However,  in  reality,  user-specified  trust  relations  are  very  sparse,  i.e.,  a  tiny 
number  of  pairs  of  users  with  trust  relations  are  buried  in  a  disproportionately  large 
number  of  pairs  without  trust  relations,  making  trust  prediction  a  daunting  task.  As  an 
important  social  concept,  however,  trust  has  received  growing  attention  and  interest. 
Social  theories  are  developed  for  understanding  trust.  Homophily  is  one  of  the  most 
important  theories  that  explain  why  trust  relations  are  established.  Exploiting  the 
homophily  effect  for  trust  prediction  provides  challenges  and  opportunities.  In  this  paper, 
we  embark  on  the  challenges  to  investigate  the  trust  prediction  problem  with  the 
homophily  effect.  First,  we  delineate  how  it  differs  from  existing  approaches  to  trust 
prediction  in  an  unsupervised  setting.  Next,  we  formulate  the  new  trust  prediction 
problem  into  an  optimization  problem  integrated  with  homophily,  empirically  evaluate 
our  approach  on  two  datasets  from  real-world  product  review  sites,  and  compare  with 
representative  algorithms  to  gain  a  deep  understanding  of  the  role  of  homophily  in  trust 
prediction. 

•  Key  Contributions:  In  this  paper,  we  study  the  problem  of  exploiting  homophily  effect 
for  trust  prediction.  First  we  conduct  experiments  on  datasets  from  real-world  product 
review  sites  to  demonstrate  the  existence  of  homophily  in  trust  relations.  Homophily 
regularization  is  then  introduced  to  capture  homophily  effect  in  trust  relations.  An 
unsupervised  framework  is  proposed,  incorporating  low-rank  matrix  factorization  and 
homophily  regularization.  Extensive  experiments  are  conducted  to  evaluate  the  proposed 
framework  on  real-world  trust  relation  datasets  and  the  experimental  results  demonstrate 
the  effectiveness  of  our  proposed  framework  as  well  as  the  role  of  homophily 
regularization  for  trust  prediction. 
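
The combination of low-rank matrix factorization and homophily regularization can be sketched as an objective function. The sketch below is a simplified illustration rather than the paper's formulation: the tri-factorization form (trust from user i to user j estimated as U[i] · V · U[j]), the similarity matrix `sim`, and all function names are assumptions, and the norm penalties a full model would normally include are omitted.

```python
def homophily_regularizer(U, sim):
    """Sum over user pairs of sim[i][j] * ||U[i] - U[j]||^2.

    Similar users (large sim[i][j]) are penalized for having distant latent vectors.
    """
    total = 0.0
    n = len(U)
    for i in range(n):
        for j in range(n):
            diff_sq = sum((a - b) ** 2 for a, b in zip(U[i], U[j]))
            total += sim[i][j] * diff_sq
    return total


def trust_score(U, V, i, j):
    """Low-rank estimate of how much user i trusts user j: U[i] . V . U[j]."""
    d = len(V)
    return sum(U[i][a] * V[a][b] * U[j][b] for a in range(d) for b in range(d))


def objective(G, U, V, sim, lam):
    """Squared reconstruction error of the trust matrix G plus lam times the regularizer."""
    n = len(G)
    fit = sum(
        (G[i][j] - trust_score(U, V, i, j)) ** 2
        for i in range(n)
        for j in range(n)
    )
    return fit + lam * homophily_regularizer(U, sim)
```

Minimizing such an objective over U and V trades off fitting the observed trust relations against keeping similar users close in the latent space, which is how homophily can compensate for sparse trust data.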

Accomplishment  8:  "Exploiting  Focal  and  Global  Social  Context  for  Recommendation" 

•  Research  Problem  Studied:  With  the  fast  development  of  social  media,  the  information 
overload  problem  becomes  increasingly  severe  and  recommender  systems  play  an 
important  role  in  helping  online  users  find  relevant  information  by  suggesting 
information  of  potential  interests.  Social  activities  for  online  users  produce  abundant 
social  relations.  Social  relations  provide  an  independent  source  for  recommendation, 
presenting  both  opportunities  and  challenges  for  traditional  recommender  systems.  Users 
are  likely  to  seek  suggestions  from  both  their  local  friends  and  users  with  high  global 
reputations,  motivating  us  to  exploit  social  relations  from  local  and  global  perspectives 
for  online  recommender  systems  in  this  paper.  We  develop  approaches  to  capture  local 
and  global  social  relations,  and  propose  a  novel  framework,  LOCABAL,  taking
advantage  of  both  local  and  global  social  context  for  recommendation.  Empirical  results 
on  real-world  datasets  demonstrate  the  effectiveness  of  our  proposed  framework  and 
further  experiments  are  conducted  to  understand  how  local  and  global  social  context 
work  for  the  proposed  framework. 


•  Key  Contributions:  The  availability  of  social  relations  presents  both  challenges  and 
opportunities  for  traditional  recommender  systems.  In  this  paper,  we  investigate  how  to 
exploit  local  and  global  social  context  for  recommendation.  To  capture  local  social 
context,  we  require  that  the  user  preferences  of  two  socially  connected  users  be  correlated,
as  suggested  by  social  correlation  theories,  and  we  also  study  the  connections  between  our
proposed  approach  and  existing  approaches.  Ratings  from  users  with  high  reputations  are 
more  likely  to  be  trustworthy;  therefore,  to  capture  global  social  context,  we  use  user 
reputation  scores  to  weight  the  importance  of  their  ratings.  With  these  solutions,  we 
propose  a  framework  LOCABAL  to  integrate  local  and  global  social  context  for 
recommendation.  Experimental  results  on  real-world  data  sets  show  that  the  proposed 
framework  LOCABAL  outperforms  representative  social  recommender  systems.  Further
experiments  are  conducted  to  understand  the  working  of  LOCABAL. 
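
The two ideas described above, weighting rating errors by user reputation (global context) and tying together the preferences of connected users (local context), can be sketched as a single loss. This is a simplified illustration under stated assumptions: the squared-distance form of the local term, the parameter `alpha`, and every name below are hypothetical and are not the LOCABAL formulation itself.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def social_context_loss(ratings, P, Q, reputation, friends, alpha):
    """Reputation-weighted rating error plus a penalty on connected users who differ.

    ratings:    list of (user, item, rating) triples
    P, Q:       dicts mapping users / items to latent preference vectors
    reputation: dict user -> weight in [0, 1] (assumed to be precomputed)
    friends:    dict (user, friend) -> strength of the social tie
    alpha:      weight of the local-context term
    """
    # Global context: errors on ratings from reputable users count more.
    global_term = sum(
        reputation[u] * (r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings
    )
    # Local context: connected users are pulled toward similar preference vectors.
    local_term = sum(
        s * squared_distance(P[u], P[k]) for (u, k), s in friends.items()
    )
    return global_term + alpha * local_term
```

Setting `alpha` to zero or all reputation weights to one removes the local or the global context respectively, which is a convenient way to study the contribution of each.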

Accomplishment  9:  "A  Tool  for  Collecting  Provenance  Data  in  Social  Media" 

•  Research  Problem  Studied:  In  recent  years,  social  media  sites  have  provided  a  large 
amount  of  information.  Recipients  of  such  information  need  mechanisms  to  know  more 
about  the  received  information,  including  the  provenance.  Previous  research  has  shown 
that  some  attributes  related  to  the  received  information  provide  additional  context,  so  that 
the  recipient  can  assess  the  amount  of  value,  trust,  and  validity  to  be  placed  in  the  received
information.  Personal  attributes  of  a  user,  including  name,  location,  education,  ethnicity, 
gender,  and  political  and  religious  affiliations,  can  be  found  in  social  media  sites.  In  this 
paper,  we  present  a  novel  web-based  tool  for  collecting  the  attributes  of  interest 
associated  with  a  particular  social  media  user  related  to  the  received  information.  This 
tool  provides  a  way  to  combine  different  attributes  available  at  different  social  media 
sites  into  a  single  user  profile.  Using  different  types  of  Twitter  users,  we  also  evaluate  the 
performance  of  the  tool  in  terms  of  number  of  attribute  values  collected,  validity  of  these 
values,  and  total  amount  of  retrieval  time. 

•  Key  Contributions:  The  provenance  data  collector  tool  aims  to  collect  provenance 
attribute  values  of  a  user.  By  collecting  such  values  of  a  user  related  to  the  received 
information,  the  tool  could  help  recipients  understand  more  about  the  received
information.  Data  generated  on  social  media  sites  is  largely  distributed  and  unstructured 
in  nature.  The  proposed  tool  also  provides  a  way  to  combine  such  distributed  and 
unstructured  social  media  data. 

Accomplishment  10:  "Recovering  Information  Recipients  in  Social  Media  via  Provenance" 

•  Research  Problem  Studied:  In  recent  years,  social  media  has  changed  the  way  we 
interact  and  communicate.  Although  the  existing  structure  of  social  media  allows  users  to 
easily  create,  receive,  and  propagate  pieces  of  information,  many  a  time,  users  do  not 
have  background  knowledge  about  the  received  information,  including  the  provenance 
(sources  or  originators)  of  information,  and  other  recipients  who  may  have  retransmitted
or  modified  the  information.  Providing  such  additional  context  to  the  received
information  can  help  users  know  how  much  value,  trust,  and  validity  should  be  placed  in 
received  information.  To  judge  the  credibility  of  the  received  piece  of  information,  it  is 
vital  to  know  who  its  sources  are  and  how  information  propagates  from  sources  to  other
social  media  users.  In  this  paper,  we  study  a  novel  research  problem  that  enables
a  few  known  recipients  to  recover  other  unknown  recipients  and  seek  the  provenance  of
information.  The  experimental  results  with  Facebook  and  Twitter  datasets  show  that  the 
proposed  algorithm  is  effective  in  correctly  recovering  the  unknown  recipients  and 
seeking  the  provenance  of  information. 

•  Key  Contributions:  Social  media  allows  its  users  to  share  a  vast  amount  of  information 
with  other  users,  but  it  provides  no  mechanism  to  know  more  about  the  received 
information  for  its  users.  In  this  paper,  we  aim  to  recover  information  recipients  as  well 
as  seek  the  provenance  by  knowing  a  few  nodes  and  using  only  link  information  in  social 
networks.  Information  recipients  exist  along  the  paths  from  the  sources  to  the  known 
nodes.  In  this  paper  we  seek  the  information  propagation  flow  from  the  sources  to  the 
known  nodes,  and  recover  the  most  likely  information  recipients.  Using  the  experiment 
results  from  the  Facebook  and  Twitter  datasets,  we  show  that  the  proposed  algorithm  is 
effective  in  correctly  recovering  the  information  recipients  and  seeking  the  provenance  of 
information. 
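
The observation that recipients lie along the paths from the sources to the known nodes can be illustrated with a small link-only sketch. It assumes that a candidate source is already hypothesized and that information travels along shortest paths; neither assumption comes from the paper, and the function names are hypothetical. It is not the proposed algorithm.

```python
from collections import deque


def shortest_path(graph, src, dst):
    """Breadth-first search over a directed adjacency dict; returns a node list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_recipients(graph, source, known):
    """Intermediate nodes on shortest paths from `source` to each known recipient."""
    found = set()
    for k in known:
        path = shortest_path(graph, source, k)
        if path:
            # Drop the endpoints: the source and the known recipient itself.
            found.update(path[1:-1])
    return found - set(known)
```

Repeating this for several candidate sources and comparing how well each explains the known recipients is one simple way to rank possible provenance.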

Accomplishment  11:  "Context-Aware  Review  Helpfulness  Rating  Prediction"

•  Research  Problem  Studied:  Online  reviews  play  a  vital  role  in  the  decision-making 
process  for  online  users.  Helpful  reviews  are  usually  buried  in  a  large  number  of 
unhelpful  reviews,  and  with  the  consistently  increasing  number  of  reviews,  it  becomes 
more  and  more  difficult  for  online  users  to  find  helpful  reviews.  Therefore  most  online 
review  websites  allow  online  users  to  rate  the  helpfulness  of  a  review  and  a  global 
helpfulness  score  is  computed  for  the  review  based  on  its  available  ratings.  However,  in 
reality,  user-specified  helpfulness  ratings  for  reviews  are  very  sparse:  a  few  reviews
attract  large  numbers  of  helpfulness  ratings  while  most  reviews  obtain  few  or  even  no 
helpfulness  ratings.  The  available  helpfulness  ratings  are  too  sparse  for  online  users  to 
assess  the  helpfulness  of  reviews.  Also,  a  review  is  not  necessarily
equally  helpful  for  all  users,  and  users  with  different  backgrounds  may  judge  the  helpfulness
of  a  review  very  differently.  The  user  idiosyncrasy  of  review  helpfulness  motivates  us  to
study  the  problem  of  review  helpfulness  rating  prediction  in  this  paper.  We  first  identify 
various  types  of  context  information,  model  them  mathematically,  and  propose  a
context-aware  review  helpfulness  rating  prediction  framework,  CAP.  Experimental  results
demonstrate  the  effectiveness  of  the  proposed  framework  and  the  importance  of  context 
awareness  in  solving  the  review  helpfulness  rating  prediction  problem. 

•  Key  Contributions:  In  this  paper  we  study  the  problem  of  review  helpfulness  rating 
prediction  by  exploiting  context  awareness  to  infer  unknown  helpfulness  ratings
automatically,  motivated  by  the  fact  that  helpful  reviews  can  be  buried  in  large  amounts
of  useless  reviews  and  the  user-specific  helpfulness  ratings  are  too  sparse  for  online  users 
to  assess  the  helpfulness  of  reviews.  We  first  show  that  the  problem  we  study  differs 
from  the  review  quality  prediction  problem  and  the  item  rating  prediction  problem.  We
extract  four  types  of  social  context,  i.e.,  author  context,  rater  context,  connection  context 
and  preference  context,  formulate  them  mathematically,  and  propose  a  context-aware 
helpfulness  prediction  framework,  CAP,  which  exploits  content  context  and  various  types
of  social  context.  Experimental  results  demonstrate  that  our  proposed  framework
outperforms  state-of-the-art  baseline  methods  in  both  cold-start  and  warm-start
settings,  and  further  experiments  are  conducted  to  understand  the  importance  of  context 
awareness  in  the  proposed  framework. 

Accomplishment  12:  "Seeking  Provenance  of  Information  in  Social  Media" 

•  Research  Problem  Studied:  Social  media  has  profoundly  impacted  the  way  people 
interact  and  communicate.  Social  media  propagates  breaking  news  and  disinformation 
alike  fast  and  on  an  unsurpassed  scale.  Because  of  the  democratizing  nature  of  social
media,  users  can  easily  produce,  receive  and  propagate  a  piece  of  information  without
necessarily  providing  traceable  information.  Thus,  there  are  no  means  for  a  user  to  verify
the  provenance  (also  known  as  sources  or  originators)  of  information.
Disinformation  can  cause  tragic  consequences  to  society  and  individuals.  This  work  aims
to  take  advantage  of  characteristics  of  social  media  to  provide  a  solution  to  the  problem 
of  lacking  traceable  information.  Such  knowledge  can  provide  additional  context  to  the 
received  information  such  that  a  user  can  assess  how  much  value,  trust,  and  validity 
should  be  placed  in  received  information.  In  this  paper,  we  study  a  novel  research
problem  that  enables  a  few  known  recipients  (less  than  1%  of  the  total  recipients)  to
seek  the  provenance  of  information  by  recovering  how  it  has  flowed  from  its  originators.
The  proposed  methodology  exploits  easily  computable  node  centralities  of  a  large  social 
media  network.  The  experimental  results  with  Facebook  and  Twitter  datasets  show  that 
the  proposed  mechanism  is  effective  in  correctly  identifying  the  additional  recipients  and 
seeking  the  provenance  of  information. 

•  Key  Contributions:  Social  media  allows  its  users  to  share  a  vast  amount  of  information
with  other  users,  but  it  lacks  mechanisms  that  provide  traceable  knowledge  about  the 
received  information  for  its  users.  In  this  paper,  we  study  a  novel  research  problem  that 
enables  a  few  P-nodes  (less  than  1%  of  total  recipients)  to  seek  the  provenance  of
information  by  identifying  how  it  has  flowed  from  its  originators.  To  this  end,  we  first
formally  present  the  problem  and  provide  the  complexity  analysis.  Then,  we  use  the
Facebook  and  Twitter  datasets  to  validate  two  hypotheses:  Degree
Propensity  and  Closeness  Propensity.  The  proposed  methodology  then  exploits  these 
hypotheses  to  provide  not  only  the  critical  information  about  the  provenance,  but  also  the 
most  likely  provenance  paths.  Finally,  using  the  experimental  results  with  the  Facebook
and  Twitter  datasets,  we  show  that  the  proposed  algorithm  is  effective  in  correctly
identifying  the  additional  transmitters,  and  seeking  the  provenance  of  information. 

Accomplishment  13:  "A  Tool  for  Assisting  Provenance  Search  in  Social  Media" 

•  Research  Problem  Studied:  In  recent  years,  social  media  sites  have  witnessed  an
information  explosion.  Determining  the  reliability  of  such  a  large  amount  of  information
is  a  major  area  of  research.  Information  provenance  (also  known  as  sources  or  origin)  provides  a
way  to  measure  the  reliability  of  information  in  social  networks.  The  main  challenge  in 
seeking  provenance  is  the  availability  of  suitable  data  consisting  of  sufficient  unique 
propagation  paths.  Current  research  on  provenance  in  social  media  uses  synthetically 
generated  propagation  paths.  Although  these  proposed  approaches  are  theoretically 
significant,  it  is  still  a  challenge  to  apply  and  evaluate  them  on  social  media.  Hence, 
knowledge  of  the  actual  propagation  paths  for  a  piece  of  information  will  be  a  valuable 
asset  in  provenance  search.  This  paper  presents  a  tool  for  capturing  the  propagation 
network  of  a  given  tweet  or  URL  (Uniform  Resource  Locator)  in  the  Twitter  network. 
Researchers  can  use  this  tool  to  collect  information  propagation  data,  design  effective 
strategies  for  determining  the  provenance,  and  gain  information  about  the  tweet  such  as 
impact,  growth  rate  and  users  influencing  the  spread.  An  overview  of  the  user  interface 
and  the  architecture  of  the  system  is  provided.  Two  case  studies,  one  relating  to
disinformation  in  riot  situations  and  another  on  corporate  involvement  in  education,  are
presented  to  demonstrate  the  effectiveness  of  the  system  for  seeking  provenance
information. 

•  Key  Contributions:  The  paper  presents  a  tool  to  obtain  the  spread  of  a  given  tweet  or 
URL  on  the  Twitter  network.  The  tool  presents  researchers  with  a  propagation  network  to
assist  in  seeking  the  provenance  path  of  a  given  tweet.  The  provenance  path  gives 
additional  information  to  assess  the  reliability  of  a  given  piece  of  data  in  social  media. 

Accomplishment  14:  "Provenance  Data  in  Social  Media" 

•  Book  Overview:  Social  media  shatters  the  barrier  to  communicate  anytime  anywhere 
for  people  of  all  walks  of  life.  The  publicly  available,  virtually  free  information  in 
social  media  poses  a  new  challenge  to  consumers  who  have  to  discern  whether  a 
piece  of  information  published  in  social  media  is  reliable.  For  example,  it  can  be 
difficult  to  understand  the  motivations  behind  a  statement  passed  from  one  user  to 
another,  without  knowing  the  person  who  originated  the  message.  Additionally,  false 
information  can  be  propagated  through  social  media,  resulting  in  embarrassment  or 
irreversible  damages.  Provenance  data  associated  with  a  social  media  statement  can 
help  dispel  rumors,  clarify  opinions,  and  confirm  facts.  However,  provenance  data 
about  social  media  statements  is  not  readily  available  to  users  today.  Currently, 
providing  this  data  to  users  requires  changing  the  social  media  infrastructure  or 
offering  subscription  services.  Taking  advantage  of  social  media  features,  research  in
this  nascent  field  spearheads  the  search  for  a  way  to  provide  provenance  data  to  social
media  users,  thus  leveraging  social  media  itself  by  mining  it  for  the  provenance  data. 
Searching  for  provenance  data  reveals  an  interesting  problem  space  requiring  the 
development  and  application  of  new  metrics  in  order  to  provide  meaningful 
provenance  data  to  social  media  users.  This  lecture  reviews  the  current  research  on 
information  provenance,  explores  exciting  research  opportunities  to  address  pressing 
needs,  and  shows  how  data  mining  can  enable  a  social  media  user  to  make  informed 
judgements  about  statements  published  in  social  media. 

•  Table  of  Contents:  Information  Provenance  in  Social  Media  /  Provenance  Attributes  / 
Provenance  via  Network  Information  /  Provenance  Data 

Accomplishment  15:  "User  Vulnerability  and  its  Reduction  on  a  Social  Networking  Site" 

•  Research  Problem  Studied:  Privacy  and  security  are  major  concerns  for  many  users  of 
social  media.  When  users  share  information  (e.g.,  data  and  photos)  with  friends,  they 
can  make  their  friends  vulnerable  to  security  and  privacy  breaches  with  dire 
consequences.  With  the  continuous  expansion  of  a  user’s  social  network,  privacy 
settings  alone  are  often  inadequate  to  protect  a  user’s  profile.  In  this  research,  we  aim  to
address  some  critical  issues  related  to  privacy  protection:  (1)  How  can  we  measure
and  assess  an  individual  user’s  vulnerability?  (2)  With  the  diversity  of  one’s  social
network  friends,  how  can  one  figure  out  an  effective  approach  to  maintaining  balance 
between  vulnerability  and  social  utility?  In  this  work,  first  we  present  a  novel  way  to 
define  vulnerable  friends  from  an  individual  user’s  perspective.  User  vulnerability  is 
dependent  on  whether  or  not  the  user’s  friends’  privacy  settings  protect  the  friend  and 
the  individual’s  network  of  friends  (which  includes  the  user).  We  show  that  it  is 
feasible  to  measure  and  assess  user  vulnerability,  and  reduce  one’s  vulnerability 
without  changing  the  structure  of  a  social  networking  site.  The  approach  is  to 
unfriend  one’s  most  vulnerable  friends.  However,  when  such  a  vulnerable  friend  is 
also  socially  important,  unfriending  him  would  significantly  reduce  one’s  own  social 
status.  We  formulate  this  novel  problem  as  vulnerability  minimization  with  social 
utility  constraints.  We  formally  define  the  optimization  problem,  and  provide  an 
approximation  algorithm  with  a  proven  bound.  Finally,  we  conduct  a  large-scale 
evaluation  of  the  new  framework  using  a  Facebook  dataset.  We  use  experiments  to
observe  how  much  vulnerability  an  individual  user  can  decrease  by  unfriending  a
vulnerable  friend.  We  compare  the  performance  of  different  unfriending  strategies  and
discuss  the  security  risk  of  new  friend  requests.  Additionally,  by  employing  different
forms  of  social  utility,  we  confirm  that  balance  between  user  vulnerability  and  social 
utility  can  be  practically  achieved. 

•  Key  Contributions:  We  propose  a  feasible  approach  to  a  novel  problem  of  identifying 
a  user’s  vulnerable  friends  on  a  social  networking  site.  Our  work  differs  from  existing 
work  addressing  social  networking  privacy  by  introducing  a  vulnerability-centered
approach  to  user  security  on  a  social  networking  site.  On  most  social  networking
sites,  privacy-related  efforts  have  been  concentrated  on  protecting  individual
attributes  only.  However,  users  are  often  vulnerable  through  community  attributes. 
Unfriending  vulnerable  friends  can  help  protect  users  against  the  security  risks.  Based 
on  our  study  of  over  2  million  users,  we  find  that  users  are  either  not  careful  or  not 
aware  of  security  and  privacy  concerns  of  their  friends.  Our  model  clearly  highlights 
the  impact  of  each  new  friend  on  a  user’s  privacy.  Our  approach  does  not  require  the 
structural  change  of  a  social  networking  site  and  aims  to  maximally  reduce  a  user’s 
vulnerability  while  minimizing  his  social  utility  loss.  The  work  formulates  a  novel 
problem  of  constrained  vulnerability  reduction,  suggests  a  feasible  approach,  and
demonstrates  that  the  problem  of  constrained  vulnerability  reduction  is  solvable. 
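
The trade-off between reducing vulnerability and retaining social utility can be illustrated with a simple greedy heuristic. The sketch below is not the approximation algorithm from the paper: the per-friend vulnerability and utility scores, the utility-loss budget, and the ranking rule are all assumptions made for illustration.

```python
def greedy_unfriend(friends, max_utility_loss):
    """Pick friends to unfriend, most vulnerability removed per unit of utility lost first.

    friends: dict name -> (vulnerability, utility); utilities are assumed to be positive.
    max_utility_loss: total social utility the user is willing to give up.
    Returns the list of friends to unfriend, in the order they were chosen.
    """
    ranked = sorted(
        friends,
        key=lambda f: friends[f][0] / friends[f][1],
        reverse=True,
    )
    removed = []
    loss = 0.0
    for name in ranked:
        _, utility = friends[name]
        # Skip any friend whose removal would exceed the utility budget.
        if loss + utility <= max_utility_loss:
            removed.append(name)
            loss += utility
    return removed
```

With a budget of zero nobody is unfriended, and with an unlimited budget every friend is removed; the constrained problem is interesting precisely between those two extremes.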

Accomplishment  16:  “mTrust:  Discerning  Multi-Faceted  Trust  in  a  Connected  World” 

•  Research  Problem  Studied:  The  issue  of  trust  has  attracted  increasing  attention  from  the 
community  of  social  media  research.  Trust,  as  a  social  concept,  naturally  has  multiple 
facets,  indicating  multiple  and  heterogeneous  trust  relationships  between  users.  Here  is  a 
multifaceted  trust  example  from  Epinions.  Figure  1(a)  shows  single  trust  relationships 
between  user  1  and  his  20  friends.  Here,  we  can  see  that  user  7  is  the  most  trusted  friend  of
user  1.  Figures  1(b)  and  1(c)  show  their  multifaceted  trust  relationships  in  the  categories
“home  and  garden”  and  “restaurants”,  respectively.  For  the  category  “home  and  garden”,
user  7  is  not  necessarily  the  most  trusted  friend  of  user  1.  This  shows  that  trust
relationships  in  different  categories  vary.  Thus,  people  trust  others  differently  in  different 
facets. 


(a)  Single  Trust  (b)  Trust  in  home  and  garden  (c)  Trust  in  restaurants

Figure  1:  Single  trust  and  multifaceted  trust  relationships  of  one  user  in  Epinions.
(Note:  The  thickness  of  a  line  indicates  the  level  of  trust.)


There  are  two  challenges  in  obtaining  multifaceted  trust  between  users:  first,
representing  multiple  and  heterogeneous  trust  relationships  between  users,  and
second,  estimating  the  strength  of  multifaceted  trust.  Traditionally,  trust  is  represented  by
an  adjacency  matrix.  However,  this  cannot  capture  the  multifaceted  trust  relations.  We
developed  a  new  algorithm,  mTrust,  which  extends  a  matrix  representation  to  a  tensor 
representation,  adding  an  extra  dimension  for  facet  description.  Previous  work  observed  a 
strong  correlation  between  trust  and  user  similarity  in  the  context  of  rating  systems. 
Therefore,  it  is  reasonable  to  embed  trust  strength  inference  in  rating  prediction.  Thus,  to 
evaluate  the  usefulness  of  multifaceted  trust,  this  work  embeds  the  multifaceted  trust 
inference  in  the  framework  of  rating  prediction. 
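
The step from a matrix to a tensor representation can be sketched with a sparse dictionary keyed by (truster, trustee, facet). The trust strengths used in the usage note are made-up numbers chosen to mirror the Figure 1 discussion, and the function name is hypothetical; this is an illustration of the data structure, not the mTrust inference procedure.

```python
from collections import defaultdict


def most_trusted(tensor, truster, facet=None):
    """Return the trustee with the highest trust strength from `truster`.

    tensor: dict (truster, trustee, facet) -> trust strength, a sparse 3-way tensor.
    With facet=None the facet dimension is summed out, which recovers the
    single-trust (matrix) view; otherwise only the given facet is considered.
    """
    totals = defaultdict(float)
    for (u, v, f), strength in tensor.items():
        if u == truster and (facet is None or f == facet):
            totals[v] += strength
    return max(totals, key=totals.get) if totals else None
```

For example, with hypothetical strengths `{(1, 7, "restaurants"): 0.9, (1, 7, "home and garden"): 0.2, (1, 3, "home and garden"): 0.6, (1, 3, "restaurants"): 0.1}`, user 7 is the most trusted overall and for restaurants, while user 3 is the most trusted for home and garden, a distinction the collapsed matrix cannot express.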

•  Key  Contributions:  Interesting  findings  from  the  experiments  are  that  (1)  more  than  20% 
of  reciprocal  links  are  heterogeneous,  (2)  more  than  14%  of  transitive  trust  relations  are
heterogeneous,  and  (3)  more  than  11%  of  cocitation  trust  relations  are  heterogeneous. 
With  these  findings,  mTrust  can  be  applied  to  many  online  tasks  such  as  improving  rating 
prediction,  enabling  facet-sensitive  ranking,  and  making  status  theory  applicable  to 
reciprocal  links. 

Accomplishment  17:  “eTrust:  Understanding  Trust  Evolution  in  an  Online  World” 

•  Research  Problem  Studied:  Most  existing  research  about  online  trust  assumes  static  trust 
relations  between  users.  As  we  are  informed  by  social  sciences,  trust  evolves  as  humans 
interact.  Little  work  exists  studying  trust  evolution  in  an  online  world.  Researching  online 
trust  evolution  faces  unique  challenges  because  more  often  than  not,  available  data  is 
from  passive  observation.  In  this  paper,  we  leverage  social  science  theories  to  develop  a 
methodology  that  enables  the  study  of  online  trust  evolution.  In  particular,  we  propose  a 
trust  evolution  framework,  eTrust,  which  exploits  the  dynamics  of  user  preferences  in
the  context  of  online  product  review.  We  present  technical  details  about  modeling  trust 
evolution,  and  perform  experiments  to  show  how  the  exploitation  of  trust  evolution  can 
help  improve  the  performance  of  online  applications  such  as  rating  and  trust  prediction. 

•  Key  Contributions:  We  study  online  trust  evolution  in  the  context  of  product  review.  By 
exploiting  the  correlation  between  user  preferences  and  trust  relations,  we  propose  a 
framework,  eTrust,  to  understand  the  evolution  of  trust  in  an  online  world  and  apply 
eTrust  to  various  online  applications  such  as  rating  prediction  and  trust  prediction. 
Interesting  findings  are  observed  in  our  experiments  using  real-world  data  from  Epinions,  and
eTrust  can  be  applied  to  improve  the  performance  of  rating  prediction  and  trust 
prediction. 

Accomplishment  18:  “Minimizing  User  Vulnerability  and  Retaining  Social  Utility  in  Social  Media”

•  Research  Problem  Studied:  Privacy  and  security  are  major  concerns  for  many  users  of 
social  media.  When  users  share  information  (e.g.,  data  and  photos)  with  friends,  they  can 
make  their  friends  vulnerable  to  security  and  privacy  breaches  with  dire  consequences.  In
our  earlier  work,  we  showed  that  it  is  feasible  to  measure  user  vulnerability  and  reduce
one's  vulnerability  without  changing  the  structure  of  a  social  networking  site.  The
approach  is  to  unfriend  one's  most  vulnerable  friends.  However,  when  such  a  vulnerable
friend  is  also  socially  important,  unfriending  him  would  significantly  reduce  one's  own 
social  status.  In  this  work,  we  address  the  problem  of  vulnerability  minimization  with 
minimum  social  utility  losses.  This  work  extends  the  existing  vulnerability  reduction 
model  to  a  more  general  form.  Using  this  general  model,  we  formulate  two  discrete
optimization  problems.  Both  problems  are  NP-hard. 

•  Key  Contributions:  We  formally  define  the  optimization  problem,  propose  an
approximation  algorithm  with  a  proven  bound,  and  conduct  empirical  experiments  with
different  forms  of  social  utility  on  a  large-scale  Facebook  dataset  for  performance
evaluation  and  comparison.  Our  work  differs  from  existing  work  addressing  social
networking  privacy.  Our  approach  does  not  require  the  structural  change  of  a  social
networking  site  and  aims  to  maximally  reduce  a  user's  vulnerability  while  minimizing  his 
social  utility  loss. 


